Monaural speech enhancement based on gated dilated convolutional recurrent network
Xinyuan YOU, Heng WANG
Journal of Computer Applications, 2024, 44(4): 1317-1324. DOI: 10.11772/j.issn.1001-9081.2023040452

The use of contextual information plays an important role in speech enhancement tasks. To address the under-utilization of global speech context, a Gated Dilated Convolutional Recurrent Network (GDCRN) for complex spectral mapping was proposed. GDCRN was composed of an encoder, a Gated Temporal Convolution Module (GTCM) and a decoder, with the encoder and decoder having an asymmetric network structure. Firstly, features were processed by the encoder using a Gated Dilated Convolution Module (GDCM), which expanded the receptive field. Secondly, longer contextual information was captured and selectively passed on by the GTCM. Finally, the decoder used deconvolution combined with a Gated Linear Unit (GLU) and was connected to the corresponding convolution layers of the encoder through skip connections. Additionally, a Channel Time-Frequency Attention (CTFA) mechanism was introduced. Experimental results show that the proposed network has fewer parameters and a shorter training time than other networks such as the Temporal Convolutional Neural Network (TCNN) and the Gated Convolutional Recurrent Network (GCRN). GDCRN significantly improves PESQ (Perceptual Evaluation of Speech Quality) by 0.2589 and STOI (Short-Time Objective Intelligibility) by 4.67 percentage points, demonstrating that the proposed network has a better enhancement effect and stronger generalization ability.
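The abstract credits the gated dilated convolutions with expanding the receptive field and the GLU with controlling what information passes through. The following is a minimal, dependency-free sketch of those two ideas under stated assumptions. It is not the authors' GDCM, GTCM or decoder. In particular, it uses a single-channel causal 1-D convolution instead of the multi-channel 2-D convolutions a spectral-mapping network would normally use. The kernel size of 3, the dilation rates (1, 2, 4, 8) and all function names are hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence


def dilated_conv1d(x: Sequence[float], kernel: Sequence[float],
                   dilation: int, bias: float = 0.0) -> List[float]:
    """Causal dilated 1-D convolution (illustrative, single channel).

    out[t] = bias + sum_k kernel[k] * x[t - k * dilation],
    where samples before the start of the sequence are treated as zero.
    """
    out = []
    for t in range(len(x)):
        acc = bias
        for k, w in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:  # zero padding on the left keeps the output causal
                acc += w * x[idx]
        out.append(acc)
    return out


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def gated_dilated_conv1d(x: Sequence[float], kernel_lin: Sequence[float],
                         kernel_gate: Sequence[float], dilation: int) -> List[float]:
    """GLU-style gating: a linear branch multiplied element-wise by
    sigmoid(gate branch), so the gate decides how much of each output passes."""
    lin = dilated_conv1d(x, kernel_lin, dilation)
    gate = dilated_conv1d(x, kernel_gate, dilation)
    return [a * sigmoid(b) for a, b in zip(lin, gate)]


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Receptive field (in frames) of stacked stride-1 dilated convolutions:
    1 + sum((kernel_size - 1) * d) over all layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


if __name__ == "__main__":
    # Hypothetical stack: kernel size 3, dilations doubling per layer.
    print(receptive_field(3, [1, 2, 4, 8]))  # 31 frames
    print(receptive_field(3, [1, 1, 1, 1]))  # 9 frames without dilation
```

With the same number of layers and the same number of weights, doubling the dilation per layer widens the receptive field from 9 to 31 frames in this hypothetical configuration. This is the kind of gain in context that the abstract attributes to dilation. The sigmoid gate is the standard GLU formulation. How GDCRN arranges its gates, channels and attention (CTFA) is not specified in the abstract, so the sketch should not be read as a reproduction of the paper.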
